# Efficient Vision-Language Model
Openfly Agent 7b
MIT
OpenFly is a platform for aerial vision-language navigation that provides a versatile toolchain and a large-scale benchmark.
Multimodal Fusion
Transformers, English

IPEC-COMMUNITY
234
0
Xgen Mm Vid Phi3 Mini R V1.5 128tokens 8frames
xGen-MM-Vid (BLIP-3-Video) is an efficient, compact vision-language model with an explicit temporal encoder, designed specifically for video understanding.
Video-to-Text
Safetensors, English
Salesforce
398
11
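One way to picture the "explicit temporal encoder": reading the model name as 8 input frames compressed into 128 visual tokens (an interpretation, not confirmed by this listing), the encoder turns the tokens of every frame into a short, fixed-length sequence for the language model. The sketch below assumes plain attentional pooling with learnable queries as a stand-in; the model's actual encoder may be designed differently. The function names, token counts and dimensions are hypothetical, and random vectors stand in for trained queries.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_pool(frame_tokens, queries):
    """Pool tokens from all frames into len(queries) output tokens.

    frame_tokens: list of frames, each a list of token vectors (lists of floats).
    queries: list of query vectors with the same dimension as the tokens.
    Keys and values are the raw tokens (no learned projections, for brevity).
    """
    # Flatten frames x tokens into one sequence so each query attends across time.
    tokens = [tok for frame in frame_tokens for tok in frame]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, tok)) for tok in tokens]
        weights = softmax(scores)
        # Output token = attention-weighted sum of all input tokens.
        out = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        pooled.append(out)
    return pooled


def random_vectors(n, dim, rng):
    """Gaussian random vectors, used here in place of real features and trained queries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 8 frames x 32 tokens = 256 input tokens -> 128 output tokens.
    num_frames, tokens_per_frame, dim, num_out = 8, 32, 16, 128
    frames = [random_vectors(tokens_per_frame, dim, rng) for _ in range(num_frames)]
    queries = random_vectors(num_out, dim, rng)
    video_tokens = attentional_pool(frames, queries)
    print(len(video_tokens), len(video_tokens[0]))  # 128 16
```

Because every query attends over the flattened frame-by-token sequence, each output token can mix information from any frame, and the output length stays fixed at the number of queries however many frames go in.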
Nanollava
Apache-2.0
nanoLLaVA is a 1B-parameter vision-language model designed to run efficiently on edge devices.
Image-to-Text
Transformers, English

qnguyen3
2,851
154
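To give a rough sense of why a 1B-parameter model suits edge devices, the sketch below estimates weight memory at a few common precisions. It assumes exactly 1e9 parameters and counts weights only (no activations, KV cache or runtime overhead); the precision list is illustrative and says nothing about how nanoLLaVA is actually distributed.

```python
# Back-of-the-envelope weight memory, assuming exactly 1e9 parameters (hypothetical).
PARAMS = 1_000_000_000
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        print(f"{precision:>9}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
    # fp32: 3.73 GiB, fp16/bf16: 1.86 GiB, int8: 0.93 GiB, int4: 0.47 GiB
```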